Fix flaky KubernetesInformerCreatorTest on Windows due to shared semaphore race condition #4603

Draft
Copilot wants to merge 3 commits into dependabot/maven/software.amazon.awssdk-sts-2.41.32 from copilot/fix-github-actions-workflow

Contributor

Copilot AI commented Feb 19, 2026

KubernetesInformerCreatorTest.informerInjection fails on Windows because both the pod and configmap watch stubs shared a single watchCount semaphore. On Windows, the pod informer can complete LIST→WATCH, receive the empty {} response, and immediately loop back to fire a second WATCH — all before the configmap informer makes its first WATCH request. Two pod-watch firings then satisfy watchCount.acquire(2), letting the test advance to verify() before the configmap watch ever happens.
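
To make the interleaving concrete, here is a minimal single-threaded sketch of the failure mode described above. It is not the test's actual code: the class `SharedGateDemo` and the method `sharedGateOpensEarly` are hypothetical names, and it assumes the gate starts with zero permits and that each stub hit releases one permit.

```java
import java.util.concurrent.Semaphore;

// Hypothetical demo class, not part of the kubernetes-client test suite.
class SharedGateDemo {

    // Replays the problematic interleaving on a single thread.
    // Returns true if the shared gate opens before any configmap WATCH.
    static boolean sharedGateOpensEarly() {
        // Assumption: the gate starts with zero permits and each stub hit releases one.
        Semaphore watchCount = new Semaphore(0);

        watchCount.release(); // pod informer: first WATCH
        watchCount.release(); // pod informer: second WATCH after the empty {} response

        // The configmap informer has not issued a WATCH yet, but two permits
        // are available, so a wait for "both watches" succeeds anyway.
        return watchCount.tryAcquire(2);
    }

    public static void main(String[] args) {
        System.out.println("gate opened without configmap WATCH: " + sharedGateOpensEarly());
    }
}
```

The point of the sketch is that a counting semaphore records how many signals arrived, not which stub sent them.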

Changes

  • Per-stub CountDownLatch(1) instead of shared Semaphore: replaced getCount/watchCount (shared across both informers) with four independent latches — podGetLatch, podWatchLatch, configMapGetLatch, configMapWatchLatch — each wired to its own stub
  • Idempotent signaling: CountDownLatch.countDown() is a no-op once at zero, so repeated firings from one informer cannot accidentally satisfy another informer's gate
  • Updated CountRequestAction: switched from Semaphore.release() to CountDownLatch.countDown()
  • List-before-watch ordering enforcement: each watch stub carries the corresponding get latch as a "prerequisite" parameter; CountRequestAction.doAction records an AssertionError in a static AtomicReference<Throwable> orderingViolation if a WATCH fires before the LIST has completed, and the test asserts that field is null at the end

```java
// Before: shared semaphores, so a pod watch firing twice satisfies watchCount.acquire(2)
Semaphore watchCount = new Semaphore(2);
watchParams.put("semaphore", watchCount); // same params for BOTH pod and configmap stubs

// After: independent latches with ordering enforcement
CountDownLatch podGetLatch = new CountDownLatch(1);
CountDownLatch podWatchLatch = new CountDownLatch(1);
podWatchParams.put("semaphore", podWatchLatch);
podWatchParams.put("prerequisite", podGetLatch); // enforces list happens before watch
// ...
podGetLatch.await();
podWatchLatch.await();
assertThat(orderingViolation.get()).isNull();
```

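
The idempotent-signaling point can be checked in isolation. The sketch below is a standalone illustration, not code from this PR: `PerStubLatchDemo` and `countsAfterPodSignals` are hypothetical names, and the loop stands in for the pod watch stub being hit repeatedly.

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical demo class, not part of the kubernetes-client test suite.
class PerStubLatchDemo {

    // Fires the pod watch signal `podSignals` times and returns the remaining
    // counts as {podWatchLatch, configMapWatchLatch}.
    static long[] countsAfterPodSignals(int podSignals) {
        CountDownLatch podWatchLatch = new CountDownLatch(1);
        CountDownLatch configMapWatchLatch = new CountDownLatch(1);

        for (int i = 0; i < podSignals; i++) {
            podWatchLatch.countDown(); // a no-op once the count has reached zero
        }

        // The configmap latch was never touched, so its gate stays closed.
        return new long[] {podWatchLatch.getCount(), configMapWatchLatch.getCount()};
    }

    public static void main(String[] args) {
        long[] counts = countsAfterPodSignals(3);
        System.out.println("pod latch count: " + counts[0]);
        System.out.println("configmap latch count: " + counts[1]);
    }
}
```

However many times the pod stub fires, only its own latch opens; the configmap latch still waits for a configmap request.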
Original prompt

Fix the failing GitHub Actions workflow Java 21.0.x Maven Test on windows-latest
Analyze the workflow logs, identify the root cause of the failure, and implement a fix.
Job ID: 64092008585
Job URL: https://github.com/kubernetes-client/java/actions/runs/22165572207/job/64092008585



k8s-ci-robot added the do-not-merge/work-in-progress label (indicates that a PR should not merge because it is a work in progress) on Feb 19, 2026
k8s-ci-robot added the size/XS (denotes a PR that changes 0-9 lines, ignoring generated files) and cncf-cla: yes (indicates the PR's author has signed the CNCF CLA) labels on Feb 19, 2026
…I failure

Co-authored-by: brendandburns <5751682+brendandburns@users.noreply.github.com>
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: Copilot
Once this PR has been reviewed and has the lgtm label, please ask for approval from brendandburns. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot added the size/M label (denotes a PR that changes 30-99 lines, ignoring generated files) and removed the size/XS label (denotes a PR that changes 0-9 lines, ignoring generated files) on Feb 19, 2026
Copilot AI changed the title from "[WIP] Fix failing GitHub Actions workflow for Java 21.0.x Maven Test" to "Fix flaky KubernetesInformerCreatorTest on Windows due to shared semaphore race condition" on Feb 19, 2026
@brendandburns
Contributor

@copilot the list should occur before the watch, maybe we should use the countdown latches to ensure that that is true.

Co-authored-by: brendandburns <5751682+brendandburns@users.noreply.github.com>
Contributor Author

Copilot AI commented Feb 19, 2026

@copilot the list should occur before the watch, maybe we should use the countdown latches to ensure that that is true.

Done in commit 1ccd363. Each watch stub now carries the corresponding get latch as a "prerequisite" parameter. In CountRequestAction.doAction, if the watch fires while the prerequisite latch still has count > 0 (meaning the list hasn't completed yet), an AssertionError is stored in a static AtomicReference<Throwable> orderingViolation. The test asserts that field is null at the end, so any list-after-watch ordering violation will cause an explicit failure with the message "watch request received before list was completed".
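
For reviewers who want to see the shape of that check without opening the diff, here is a rough standalone sketch. It is an approximation under assumptions, not the code in commit 1ccd363: `OrderingCheckDemo`, `onRequest`, and `replay` are hypothetical names, `onRequest` stands in for `CountRequestAction.doAction`, and keeping only the first violation via `compareAndSet` is a choice made for this sketch.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical demo class, not part of the kubernetes-client test suite.
class OrderingCheckDemo {

    static final AtomicReference<Throwable> orderingViolation = new AtomicReference<>();

    // Stand-in for the stub callback: record a violation if the prerequisite
    // latch is still closed, then signal this stub's own latch.
    static void onRequest(CountDownLatch latch, CountDownLatch prerequisite) {
        if (prerequisite != null && prerequisite.getCount() > 0) {
            // Sketch-specific choice: keep only the first violation.
            orderingViolation.compareAndSet(
                null, new AssertionError("watch request received before list was completed"));
        }
        latch.countDown();
    }

    // Replays LIST and WATCH in the given order and returns any recorded violation.
    static Throwable replay(boolean listFirst) {
        orderingViolation.set(null);
        CountDownLatch getLatch = new CountDownLatch(1);
        CountDownLatch watchLatch = new CountDownLatch(1);
        if (listFirst) {
            onRequest(getLatch, null);       // LIST has no prerequisite
            onRequest(watchLatch, getLatch); // WATCH requires LIST to have completed
        } else {
            onRequest(watchLatch, getLatch); // WATCH arrives too early
            onRequest(getLatch, null);
        }
        return orderingViolation.get();
    }

    public static void main(String[] args) {
        System.out.println("list then watch: " + replay(true));
        System.out.println("watch then list: " + replay(false));
    }
}
```

Recording the error in a shared reference rather than throwing from the callback lets the test thread surface the failure through its own assertion at the end.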

